{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# A whole workflow in production environment after my forecaster is developed\n",
    "\n",
    "In model developing process, `TSDataset` is used to preprocess(including feature engineering, data sampling, scaling, ...) the raw data the postprocess the predicted result(majorly unscaling). This post provides a way by which users could replay the preprocessing and postprocessing in production environment(e.g. model serving).\n",
    "\n",
    "In this guide, we will\n",
    "\n",
    "1. Train a TCNForecaster with nyc_taxi dataset and export the model in onnx type and save the scaler.\n",
    "2. Show users how to replay the preprocessing and postprocessing in production environment.\n",
    "3. Evaluate the performance of preprocessing and postprocessing\n",
    "4. More tips about this topic."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Forecaster developing\n",
    "\n",
    "First let's prepare the data. We will manually download the data to show the details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# run following\n",
    "!wget https://raw.githubusercontent.com/numenta/NAB/v1.0/data/realKnownCause/nyc_taxi.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then we may load the data to pandas dataframe and carry out preprocessing through `TSDataset`. You could refer to\n",
    "\n",
    "- [How to preprocess my own data](https://bigdl.readthedocs.io/en/latest/doc/Chronos/Howto/how_to_preprocess_my_data.html)\n",
    "- [How to sample data through sliding window]()\n",
    "\n",
    "for details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.preprocessing import StandardScaler\n",
    "from bigdl.chronos.data import TSDataset, get_public_dataset\n",
    "import pandas as pd\n",
    "\n",
    "# load the data to pandas dataframe\n",
    "df = pd.read_csv(\"nyc_taxi.csv\", parse_dates=[\"timestamp\"])\n",
    "\n",
    "# use nyc_taxi public dataset\n",
    "train_data, _, test_data = TSDataset.from_pandas(df,\n",
    "                                                 dt_col=\"timestamp\",\n",
    "                                                 target_col=\"value\",\n",
    "                                                 repair=False,\n",
    "                                                 with_split=True,\n",
    "                                                 test_ratio=0.1)\n",
    "\n",
    "# create a scaler for data scaling\n",
    "scaler = StandardScaler()\n",
    "\n",
    "# preprocess(generate datetime feature, scale and roll samping)\n",
    "for data in [train_data, test_data]:\n",
    "    data.gen_dt_feature(features=[\"WEEKDAY\", \"HOUR\", \"MINUTES\"])\\\n",
    "        .scale(scaler, fit=(data is train_data))\\\n",
    "        .roll(lookback=48, horizon=24)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Developing a forecaster on this data is quite easy. You may refer to other how-to guide for more detailed information.\n",
    "\n",
    "- [How to create a Forecaster](https://bigdl.readthedocs.io/en/latest/doc/Chronos/Howto/how_to_create_forecaster.html)\n",
    "- [Train forcaster on single node](https://bigdl.readthedocs.io/en/latest/doc/Chronos/Howto/how_to_train_forecaster_on_one_node.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from bigdl.chronos.forecaster import TCNForecaster  # TCN is algorithm name\n",
    "\n",
    "# create a forecaster\n",
    "forecaster = TCNForecaster.from_tsdataset(train_data)\n",
    "\n",
    "# train the forecaster\n",
    "forecaster.fit(train_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`Forecaster` API is for quick iteration during the development, when a forecaster is developed with satisfying accuracy and performance, users may prefer to export the model to formats that are easier to deploy in production environment (e.g., ONNX, openVINO, torchscript, ...). We choose to use ONNX here as an example. You may refer to other how to guides for more details.\n",
    "\n",
    "- [Export the ONNX model files to disk](https://bigdl.readthedocs.io/en/latest/doc/Chronos/Howto/how_to_export_onnx_files.html)\n",
    "- [Export the OpenVINO model files to disk](https://bigdl.readthedocs.io/en/latest/doc/Chronos/Howto/how_to_export_openvino_files.html)\n",
    "- [Export the TorchScript model files to disk](https://bigdl.readthedocs.io/en/latest/doc/Chronos/Howto/how_to_export_torchscript_files.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# save the forecaster in onnx type\n",
    "forecaster.export_onnx_file(dirname=\"nyc_taxi_onnx_model\", quantized_dirname=None)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If a scaler is used during the preprocessing process, then it should be saved for production environment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pickle\n",
    "\n",
    "# save the scaler\n",
    "# There are many ways, we use pickle here\n",
    "with open('scaler.pkl','wb') as f:\n",
    "    pickle.dump(scaler, f)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## In production environment\n",
    "When you have successfully developed a model your are really satisfied with, then it's time to deploy your model. With the highly optimized model you just exported, the whole process could be suprising easy.\n",
    "\n",
    "There are 2 possibilities for deployment:\n",
    "1. You will use the model in a monolithic application, where the input, model inference and output is located on the same server.\n",
    "2. You will use a server-client model, where you may want to adopt some model serving tools (e.g., torchserve, OpenVINO server, Triton, ...). This means that users will separate model inference with other workload.\n",
    "\n",
    "For the first choice, you may directly call some inference engine API (e.g., onnxruntime, OpenVINO, ...) in your application. For the second choice, this may depends on different model serving tools' procedure. We have [an example to serve a forecaster on torchserve](https://github.com/intel-analytics/BigDL/tree/main/python/chronos/example/serving)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For both choicies, it's common to have a single sample data to come, here we use the last sample of nyc taxi dataset as example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# generate data to predict in a local csv file\n",
    "_, _, test_data = get_public_dataset(\"nyc_taxi\")\n",
    "test_data.df[-48:].to_csv(\"inference_data.csv\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then we could load all the items we need, this includes the scaler and onnx file we just dumped, and the data to be inferenced."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import onnxruntime\n",
    "\n",
    "# load the scaler\n",
    "with open('scaler.pkl', 'rb') as f:\n",
    "    scaler = pickle.load(f)\n",
    "\n",
    "# load the onnx file to onnxruntime\n",
    "session = onnxruntime.InferenceSession(\"nyc_taxi_onnx_model/onnx_saved_model.onnx\")\n",
    "\n",
    "# load the data to be predicted\n",
    "df = pd.read_csv(\"inference_data.csv\", parse_dates=[\"timestamp\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The preprocess process should be the same as how you processed your data when developing the forecaster, except\n",
    "\n",
    "> 📝**Note**\n",
    "> \n",
 There are 2 exceptions">
    "> The following 2 exceptions should be followed carefully:\n",
    ">\n",
    "> - Please make sure to set `deploy_mode=True` when creating TSDataset through `TSDataset.from_pandas`, `TSDataset.from_parquet` and `TSDataset.from_prometheus`, which will reduce data processing latency and set necessary parameters for deployment.\n",
    "> \n",
 - For `scale`, please">
    "> - For `scale`, please make sure to use the scaler you dumped and loaded back."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def preprocess_during_deployment(df, scaler):\n",
    "    tsdata = TSDataset.from_pandas(df,\n",
    "                                   dt_col=\"timestamp\",\n",
    "                                   target_col=\"value\",\n",
    "                                   repair=False,\n",
    "                                   deploy_mode=True)\n",
    "    tsdata.gen_dt_feature(features=[\"WEEKDAY\", \"HOUR\", \"MINUTES\"])\\\n",
    "          .scale(scaler)\\\n",
    "          .roll(lookback=48, horizon=24)\n",
    "    data = tsdata.to_numpy()\n",
    "    return tsdata, data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For postprocessing, if scaler is used, then `unscale_numpy` is needed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def postprocess_during_deployment(data, tsdata):\n",
    "    return tsdata.unscale_numpy(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "They users could predict the data easily by a clear process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tsdata, data = preprocess_during_deployment(df, scaler)\n",
    "data = session.run(None, {'x': data})[0]\n",
    "processed_data = postprocess_during_deployment(data, tsdata)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.7.13 ('chronos-dev')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.13"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "96fc19a7aa36ac709fe91f44b3fc34db2191b0c55a28b77b6f5333c993443c1f"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
